Reliable Classification with Conformal Learning and Interval-Type 2 Fuzzy Sets
Fumanal-Idocin, Javier, Andreu-Perez, Javier
Classical machine learning classifiers tend to be overconfident and can be unreliable outside of laboratory benchmarks. Properly assessing the reliability of the model's output for each sample is instrumental in real-life scenarios where these systems are deployed. For this reason, different techniques have been employed to quantify the quality of a given model's predictions, most commonly Bayesian statistics and, more recently, conformal learning. Given a calibration set, conformal learning can produce outputs that are guaranteed to cover the target class at a desired significance level, and these outputs are more reliable than the standard confidence intervals used by Bayesian methods. In this work, we propose to use conformal learning with fuzzy rule-based systems for classification and report several metrics of their performance. Then, we discuss how the use of interval type-2 fuzzy sets can improve the quality of the system's output compared to both fuzzy and crisp rules. Finally, we also discuss how the fine-tuning of the system can be adapted to improve the quality of the conformal prediction.
- Europe > United Kingdom > England > Essex > Colchester (0.04)
- Europe > Sweden (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Rule-Based Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.48)
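The calibration step described in the abstract above can be illustrated with a minimal split-conformal sketch for classification. It assumes the classifier outputs one score in [0, 1] per class (for a fuzzy rule-based system, these could be normalized rule-firing strengths, which is an assumption here). The nonconformity score `1 - score of the true class`, the function names `conformal_threshold` and `prediction_set`, and the calibration numbers are all illustrative choices, not the paper's method.

```python
import math
from typing import Sequence, Set


def conformal_threshold(
    cal_scores: Sequence[Sequence[float]], cal_labels: Sequence[int], alpha: float
) -> float:
    """Split-conformal threshold on the nonconformity s = 1 - score(true class)."""
    # One nonconformity value per calibration sample (assumed score choice).
    noncon = sorted(1.0 - probs[y] for probs, y in zip(cal_scores, cal_labels))
    n = len(noncon)
    # Finite-sample corrected rank ceil((n + 1)(1 - alpha)).
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points for this alpha: every class must be kept.
        return math.inf
    return noncon[k - 1]


def prediction_set(probs: Sequence[float], qhat: float) -> Set[int]:
    """All classes whose nonconformity does not exceed the calibrated threshold."""
    return {c for c, p in enumerate(probs) if 1.0 - p <= qhat}


# Hypothetical calibration data: 10 samples whose true class is 0.
true_p = [0.9, 0.85, 0.8, 0.75, 0.7, 0.65, 0.6, 0.55, 0.5, 0.3]
cal_scores = [[p, (1 - p) / 2, (1 - p) / 2] for p in true_p]
cal_labels = [0] * len(true_p)

qhat = conformal_threshold(cal_scores, cal_labels, alpha=0.25)
print(qhat)  # 0.5
print(prediction_set([0.5, 0.5, 0.0], qhat))  # {0, 1}
```

With 10 calibration points and `alpha = 0.25`, the threshold is the 9th smallest nonconformity value, since ceil(11 * 0.75) = 9. When calibration and test samples are exchangeable, the resulting sets contain the true class with probability at least 1 - alpha. An ambiguous input such as `[0.5, 0.5, 0.0]` yields a two-class set rather than an overconfident single label.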
Conformal Ranked Retrieval
Xu, Yunpeng, Guo, Wenge, Wei, Zhi
Ranked retrieval refers to retrieving documents from a document repository and ranking them by their relevance to a user's query. As the core component of Information Retrieval (IR) systems, its goal is to place the most relevant documents at the top of the search results list, making it easier for users to find the information they seek (Baeza-Yates and Ribeiro-Neto, 1999). Over the years, ranked retrieval techniques have been successfully applied to many real-life problems, including web search engines, recommendation systems, and question-and-answer platforms, significantly impacting our daily lives. While ranked retrieval algorithms have been studied extensively in both academia and industry, accounting for the uncertainty in their predictions is a relatively new challenge. As we increasingly rely on search engines to answer a wide variety of questions, it becomes crucial to evaluate the reliability of the retrieved answers. It is therefore important to quantify the uncertainty of the results: whether they include all the desired documents, and whether those documents are ranked in a reasonable order. The difficulty lies in measuring uncertainty for ranked retrieval algorithms and in developing methods to control it. This is especially hard because ranked retrieval systems are complex, typically consisting of multiple stages, each with a different optimization goal.
- North America > United States > New Jersey (0.04)
- Asia > Middle East > Jordan (0.04)
- Asia > Myanmar > Tanintharyi Region > Dawei (0.04)
- Information Technology > Information Management > Search (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
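One way to make "whether they include all the desired documents" concrete is a generic split-conformal score cutoff. The sketch below is a standard conformal construction, not the method of this paper. For each calibration query, it records the lowest ranker score among that query's relevant documents, then picks a cutoff so that, over exchangeable queries, all relevant documents of a new query score at or above it with probability at least 1 - alpha. The names `calibrate_cutoff` and `retrieve`, the `(doc_id, score)` layout, and the toy data are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# (doc_id, ranker score); higher score means more relevant.
Doc = Tuple[str, float]


def calibrate_cutoff(
    cal_queries: Sequence[Tuple[Sequence[float], Sequence[bool]]], alpha: float
) -> float:
    """Score cutoff such that, with probability >= 1 - alpha over exchangeable
    queries, every relevant document of a new query scores at or above it."""
    noncon = []
    for scores, relevant in cal_queries:
        # Lowest score among this query's relevant documents (inf if none:
        # any cutoff trivially retrieves all of them).
        lowest = min((s for s, r in zip(scores, relevant) if r), default=math.inf)
        noncon.append(-lowest)  # large when a relevant document scores low
    noncon.sort()
    n = len(noncon)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return -math.inf  # not enough calibration queries: keep everything
    return -noncon[k - 1]


def retrieve(docs: Sequence[Doc], cutoff: float) -> List[Doc]:
    """Keep documents at or above the cutoff, ranked by score."""
    kept = [d for d in docs if d[1] >= cutoff]
    return sorted(kept, key=lambda d: d[1], reverse=True)


# Hypothetical calibration queries: (ranker scores, relevance labels).
cal = [
    ([0.9, 0.8, 0.3], [True, False, True]),  # lowest relevant score: 0.3
    ([0.7, 0.6, 0.2], [True, True, False]),  # 0.6
    ([0.95, 0.4], [True, True]),             # 0.4
    ([0.5, 0.1], [False, True]),             # 0.1
]
cutoff = calibrate_cutoff(cal, alpha=0.5)
print(cutoff)  # 0.3
print(retrieve([("a", 0.9), ("b", 0.35), ("c", 0.2)], cutoff))
```

A lower alpha pushes the cutoff down, so more documents are retrieved. With only four calibration queries, any alpha below 0.2 already forces the cutoff to `-inf`. Ordering quality, the second concern raised in the abstract, is not addressed by this sketch.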
Robust Conformal Prediction under Distribution Shift via Physics-Informed Structural Causal Model
Xu, Rui, Sun, Yue, Chen, Chao, Venkitasubramaniam, Parv, Xie, Sihong
Uncertainty is critical to reliable decision-making with machine learning. Conformal prediction (CP) handles uncertainty by predicting a set for a test input, with the aim that the set covers the true label with probability at least $(1-\alpha)$. This coverage can be guaranteed on test data even if the marginal distributions $P_X$ differ between the calibration and test datasets. However, when the conditional distribution $P_{Y|X}$ differs between calibration and test data, as is common in practice, the coverage is no longer guaranteed, and it is essential to measure and minimize the coverage loss under distribution shift at \textit{all} possible confidence levels. To address these issues, we upper-bound the coverage difference at all levels using the cumulative distribution functions of the calibration and test conformal scores and the Wasserstein distance. Inspired by the invariance of physics across data distributions, we propose a physics-informed structural causal model (PI-SCM) to reduce the upper bound. We validate that PI-SCM can improve coverage robustness across confidence levels and test domains on a traffic speed prediction task and an epidemic spread task with multiple real-world datasets.
- North America > United States (0.93)
- Asia > Japan (0.05)
- Asia > China > Guangdong Province > Guangzhou (0.04)
- Transportation (0.95)
- Health & Medicine > Therapeutic Area > Immunology (0.94)
- Health & Medicine > Epidemiology (0.68)
- Government > Regional Government > North America Government > United States Government (0.46)
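The abstract bounds the coverage difference using the distributions of calibration and test conformal scores and their Wasserstein distance. The sketch below does not reproduce that bound. It only measures the two ingredients empirically: the coverage gap at several significance levels when the calibration quantile is applied to shifted test scores, and the 1-Wasserstein distance between the two score samples. The integer scores, the +2 shift, and the helper names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def conformal_quantile(cal_scores: Sequence[float], alpha: float) -> float:
    """Split-conformal quantile: the ceil((n + 1)(1 - alpha))-th smallest score."""
    s = sorted(cal_scores)
    n = len(s)
    k = math.ceil((n + 1) * (1 - alpha))
    return math.inf if k > n else s[k - 1]


def coverage_profile(
    cal_scores: Sequence[float], test_scores: Sequence[float], alphas: Sequence[float]
) -> List[Tuple[float, float, float]]:
    """For each alpha: (alpha, empirical test coverage, target minus coverage)."""
    out = []
    for alpha in alphas:
        q = conformal_quantile(cal_scores, alpha)
        coverage = sum(t <= q for t in test_scores) / len(test_scores)
        out.append((alpha, coverage, (1 - alpha) - coverage))
    return out


def wasserstein_1d(u: Sequence[float], v: Sequence[float]) -> float:
    """W1 distance between two empirical distributions of equal size:
    the mean absolute difference of the sorted samples."""
    if len(u) != len(v):
        raise ValueError("this sketch assumes equally sized samples")
    return sum(abs(a - b) for a, b in zip(sorted(u), sorted(v))) / len(u)


cal = [1, 2, 3, 4, 5, 6, 7, 8, 9]  # hypothetical calibration scores
test = [s + 2 for s in cal]        # test scores shifted upward by 2
print(wasserstein_1d(cal, test))  # 2.0
for alpha, cov, gap in coverage_profile(cal, test, [0.1, 0.25, 0.5]):
    print(f"alpha={alpha:.2f} coverage={cov:.3f} gap={gap:+.3f}")
```

Without the shift, the empirical coverage meets or exceeds 1 - alpha at each level. After the shift, it falls short at every level shown, which is the kind of loss the abstract aims to bound and reduce.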
Deep Air Quality Forecasting Using Hybrid Deep Learning Framework
Du, Shengdong, Li, Tianrui, Yang, Yan, Horng, Shi-Jinn
Air quality forecasting has been regarded as a key problem in air pollution early warning and control management. In this paper, we propose a novel deep learning model for air quality (mainly PM2.5) forecasting, which learns the spatial-temporal correlation features and interdependence of multivariate air-quality-related time series data through a hybrid deep learning architecture. Because multivariate air quality time series are nonlinear and dynamic, the base modules of our model are one-dimensional Convolutional Neural Networks (CNN) and Bi-directional Long Short-Term Memory networks (Bi-LSTM). The former extracts local trend features, and the latter learns long-term temporal dependencies. We then design a joint hybrid deep learning framework based on one-dimensional CNN and Bi-LSTM for learning shared representation features of multivariate air-quality-related time series data. The experimental results show that our model can forecast PM2.5 air pollution with satisfactory accuracy.
- Asia > China > Beijing > Beijing (0.05)
- North America > Trinidad and Tobago > Trinidad > Arima > Arima (0.05)
- Asia > Taiwan (0.04)
- South America > Chile > Araucanía Region > Cautín Province > Temuco (0.04)
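The abstract assigns the one-dimensional CNN the role of extracting local trend features from multivariate time series. The pure-Python sketch below covers only that stage and the windowing that turns a series into supervised samples. The Bi-LSTM branch and the joint framework are not reproduced. The column layout `[PM2.5, temperature]`, the window length, and the kernel values are hypothetical. In a trained model, the kernels would be learned rather than set by hand.

```python
from typing import List, Sequence, Tuple


def make_windows(
    series: Sequence[Sequence[float]], target_col: int, window: int, horizon: int = 1
) -> Tuple[List[List[List[float]]], List[float]]:
    """Slice a (time x features) series into supervised pairs:
    X[i] = rows i .. i+window-1, y[i] = target at row i+window+horizon-1."""
    X, y = [], []
    for i in range(len(series) - window - horizon + 1):
        X.append([list(row) for row in series[i : i + window]])
        y.append(series[i + window + horizon - 1][target_col])
    return X, y


def conv1d_relu(
    window: Sequence[Sequence[float]], kernel: Sequence[Sequence[float]], bias: float = 0.0
) -> List[float]:
    """'Valid' 1D convolution along time (cross-correlation, as deep-learning
    libraries compute it) mixing all features, followed by ReLU.
    kernel has shape (kernel_size x n_features)."""
    k = len(kernel)
    out = []
    for t in range(len(window) - k + 1):
        acc = bias
        for j, weights in enumerate(kernel):
            for f, w in enumerate(weights):
                acc += w * window[t + j][f]
        out.append(max(0.0, acc))
    return out


# Hypothetical hourly rows: [PM2.5, temperature].
series = [[10, 1], [12, 1], [15, 2], [11, 2], [9, 3], [14, 3]]
X, y = make_windows(series, target_col=0, window=3)
print(y)  # [11, 9, 14]

# Hand-picked kernel: first difference of PM2.5, temperature ignored.
trend_kernel = [[-1.0, 0.0], [1.0, 0.0]]
print(conv1d_relu(X[0], trend_kernel))  # [2.0, 3.0]
```

With ReLU, this particular kernel responds only to rising PM2.5 within the window. A bank of such kernels would produce the per-step feature maps that a Bi-LSTM could then consume.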